Semi-supervised learning for Machine Translation

Authors

  • Nicola Ueffing
  • Gholamreza Haffari
  • Anoop Sarkar
Abstract

Statistical machine translation systems are usually trained on large amounts of bilingual text, which is used to learn a translation model, and on large amounts of monolingual text in the target language, which is used to train a language model. In this chapter we explore semi-supervised methods that make effective use of monolingual data from the source language in order to improve translation quality. In particular, we use monolingual source-language data from the same domain as the test set (without directly using the test set itself) and apply semi-supervised methods to adapt the model to the test set domain. We propose several algorithms with this aim and present the strengths and weaknesses of each one. We present detailed experimental evaluations on French–English and Chinese–English data and show that translation quality can be improved under some settings.
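The abstract does not spell out the proposed algorithms, so the following is only a minimal sketch of a generic self-training loop of the kind commonly used for this setting, not the chapter's actual method. The names `self_train` and `toy_train`, the confidence threshold, and the word-for-word toy model are all assumptions introduced for illustration.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]                       # (source sentence, target sentence)
Model = Callable[[str], Tuple[str, float]]   # returns (translation, confidence)


def self_train(
    bitext: List[Pair],
    mono_source: List[str],
    train: Callable[[List[Pair]], Model],
    threshold: float = 0.5,
    rounds: int = 3,
) -> Model:
    """Generic self-training loop (hypothetical interface, not the chapter's algorithm)."""
    model = train(bitext)
    for _ in range(rounds):
        # Translate the in-domain monolingual source sentences with the current model.
        hypotheses = [(src, *model(src)) for src in mono_source]
        # Keep only the translations the model is confident about.
        selected = [(src, tgt) for src, tgt, conf in hypotheses if conf >= threshold]
        if not selected:
            break
        # Retrain on the original bitext plus the selected machine-translated pairs.
        model = train(bitext + selected)
    return model


def toy_train(pairs: List[Pair]) -> Model:
    """Toy stand-in for a translation model: a word-for-word lexicon (assumption)."""
    lexicon: Dict[str, str] = {}
    for src, tgt in pairs:
        s, t = src.split(), tgt.split()
        if len(s) == len(t):                 # naive position-wise alignment
            for a, b in zip(s, t):
                lexicon.setdefault(a, b)

    def translate(sentence: str) -> Tuple[str, float]:
        words = sentence.split()
        output = " ".join(lexicon.get(w, w) for w in words)
        # Confidence here is simply the fraction of source words found in the lexicon.
        confidence = sum(w in lexicon for w in words) / len(words) if words else 0.0
        return output, confidence

    return translate
```

In a real system the retraining step would re-estimate translation model parameters from the added pairs; the toy lexicon here only illustrates the control flow of translating, filtering by confidence, and retraining.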

Similar resources

A semi-supervised learning approach for morpheme segmentation for an Arabic dialect

We present a semi-supervised learning approach which utilizes a heuristic model for learning morpheme segmentation for Arabic dialects. We evaluate our approach by applying morpheme segmentation to the training data of a statistical machine translation (SMT) system. Experiments show that our approach is less sensitive to the availability of annotated stems than a previous rule-based approach an...

Graph-based Learning for Statistical Machine Translation

Current phrase-based statistical machine translation systems process each test sentence in isolation and do not enforce global consistency constraints, even though the test data is often internally consistent with respect to topic or style. We propose a new consistency model for machine translation in the form of a graph-based semi-supervised learning algorithm that exploits similarities betwee...

Improved Arabic Dialect Classification with Social Media Data

Arabic dialect classification has been an important and challenging problem for Arabic language processing, especially for social media text analysis and machine translation. In this paper we propose an approach to improving Arabic dialect classification with semi-supervised learning: multiple classifiers are trained with weakly supervised, strongly supervised, and unsupervised data. Their comb...

Active Semi-Supervised Learning for Improving Word Alignment

Word alignment models form an important part of building statistical machine translation systems. Semi-supervised word alignment aims to improve the accuracy of automatic word alignment by incorporating full or partial alignments acquired from humans. Such dedicated elicitation effort is often expensive and depends on availability of bilingual speakers for the language-pair. In this paper we st...

Transductive learning for statistical machine translation

Statistical machine translation systems are usually trained on large amounts of bilingual text and monolingual text in the target language. In this paper we explore the use of transductive semi-supervised methods for the effective use of monolingual data from the source language in order to improve translation quality. We propose several algorithms with this aim, and present the strengths and w...

Learning New Semi-Supervised Deep Auto-encoder Features for Statistical Machine Translation

In this paper, instead of designing new features based on intuition, linguistic knowledge and domain, we learn some new and effective features using the deep autoencoder (DAE) paradigm for phrase-based translation model. Using the unsupervised pre-trained deep belief net (DBN) to initialize DAE’s parameters and using the input original phrase features as a teacher for semi-supervised fine-tunin...

Publication date: 2008